Merge pull request #8 from fotoente/dev

Dev
fotoente 2022-02-19 16:05:07 +01:00 committed by GitHub
commit c77a99fe5f
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
9 changed files with 201 additions and 138 deletions

9
.dockerignore Normal file

@ -0,0 +1,9 @@
.git*
__pycache__
bot.cfg
README.md
LICENSE
markov.json
roboduck.db
docker-compose.yml
Dockerfile

26
.github/workflows/dockerhub.yml vendored Normal file

@ -0,0 +1,26 @@
name: dockerhub
on:
  push:
    branches:
      - main
jobs:
  buildx:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Login to Docker Hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_HUB_USERNAME }}
          password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: ${{ secrets.DOCKER_HUB_USERNAME }}/misskey-ebooks-bot:latest
          platforms: linux/amd64,linux/arm64,linux/arm/v7,linux/arm/v6

11
Dockerfile Normal file

@ -0,0 +1,11 @@
FROM python:3
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install git+https://github.com/yupix/Mi.py.git@v3.3.0
COPY . .
CMD [ "python", "-u", "./rdbot.py" ]


@ -3,19 +3,21 @@ Misskey eBooks Bot with Markov Chain
[Example @roboduck@ente.fun](https://ente.fun/@roboduck)
### Introduction
## Introduction
This small python script is a Markov Chain eBooks bot based on the framework of [mi.py](https://github.com/yupix/Mi.py.git)
It can only read and write from and to Misskey. Reading from Mastodon or Pleroma is not (yet) implemented.
It posts every hour on his own and reacts to mentions. Every 12 hours the bot reloads the notes and recalculates the Markov Chain.
### Operating mode
## Operating mode
On the first start up the bot loads a given number of posts into his database and calculates the Markov Chain out of it.
After this he only updates the database with new posts. The update is threaded, so the bot itself isn't interrupted while the new Markov chain is calculated.
### Installation
To run `mi.py` you must isntall `python3.9` and `python3.9-dev` onto your system. (Please be aware of the requirements for mi.py!)
## Installation
### Host Installation
To run `mi.py` you must install `python3.9` and `python3.9-dev` onto your system. (Please be aware of the requirements for mi.py!)
`mi.py` is still under development and changes quickly, so please be aware that something may have changed upstream that I haven't implemented in the bot yet.
To install `mi.py`, please use the following command:
`pip install git+https://github.com/yupix/Mi.py.git`
@ -28,9 +30,26 @@ configparser
or just use the command `pip install -r requirements.txt` in the local folder where you cloned the repo.
Before starting the bot, please copy `example-bot.cfg` to `bot.cfg` and
configure it according to the configuration section below.
### Configuration
To run the bot please edit `example-bot.cfg` and rename it to `bot.cfg`
The best way to run it would be a `systemd` unit file and run it as a daemon.
Just to test it you can use `nohup python3.9 rdbot.py &` in the directory the bot is located in.
### Docker
To host this image with docker, copy the `docker-compose.yml` file to the directory that you want to host it from.
Next, you'll need to copy the contents of `example-bot.cfg` to `bot.cfg` in the
same directory and configure it according to the configuration section below.
Run `touch markov.json roboduck.db` in order to create the markov and database
files before starting the docker container. These files must already exist
before starting the docker container.
Then, simply run `docker-compose up` to start the app, or `docker-compose up -d`
to start the bot in detached mode!
## Configuration
The following settings can be edited:
|Name|Values|Explanation|
|----|----|----|
@ -52,12 +71,6 @@ Following things can be edited:
You can change the configuration while the bot is running. No restart necessary, they take immediate effect.
### Starting the Bot
The best way to run it would be a `systemd` unit file and run it as a daemon.
Just to test it you can use `nohup python3.9 rdbot.py &` in the directory the bot is located in.
### Known Quirks
## Known Quirks
- The startup needs quite some time, about 10 seconds on my system. You know that everything runs well when the first Note is posted.
- When the bot is started, it can happen that it runs into a timeout in the first 600 seconds. To prevent that, just mention the bot and it will stay in a loop.
## Works on my machine!
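
Note on the README's Docker steps: `touch markov.json roboduck.db` exists because both files are bind-mounted in `docker-compose.yml`, and Docker typically creates a directory when the host side of a bind mount is missing. On hosts without `touch`, the same preparation could be done from Python. This is only a sketch; the helper name `prepare_bind_mount_files` is hypothetical and not part of this commit.

```python
from pathlib import Path


def prepare_bind_mount_files(directory, names=("markov.json", "roboduck.db")):
    """Create each named file if it is missing; existing files are left untouched.

    Hypothetical helper, equivalent in spirit to `touch markov.json roboduck.db`.
    Returns the list of paths that were newly created.
    """
    base = Path(directory)
    created = []
    for name in names:
        target = base / name
        if not target.exists():
            target.touch()  # creates an empty, zero-byte file
            created.append(target)
    return created
```

Calling `prepare_bind_mount_files(".")` in the directory that holds `docker-compose.yml` would leave two empty files behind, which is the state the README asks for before `docker-compose up`.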

12
docker-compose.yml Normal file

@ -0,0 +1,12 @@
version: "3"
services:
  misskey-ebooks-bot:
    build:
      context: ./
    container_name: misskey-ebooks-bot
    restart: always
    volumes:
      - ./bot.cfg:/usr/src/app/bot.cfg
      - ./roboduck.db:/usr/src/app/roboduck.db
      - ./markov.json:/usr/src/app/markov.json


@ -6,6 +6,7 @@ import mi
import sys
import configparser
import threading
from pathlib import Path
from mi import Note
from mi.ext import commands, tasks
from mi.note import Note
@ -16,22 +17,22 @@ from roboduck import *
#Load Misskey configuration
config = configparser.ConfigParser()
config.read(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
config.read((Path(__file__).parent).joinpath('bot.cfg'))
uri="wss://"+config.get("misskey","instance_write")+"/streaming"
token=config.get("misskey","token")
class MyBot(commands.Bot):
text_model = None #Holds the markov object, so it won't be recreated every time
def __init__(self):
super().__init__()
super().__init__()
@tasks.loop(3600)
async def loop_1h(self):
text = create_sentence()
await bot.client.note.send(content=text)
@tasks.loop(43200)
async def loop_12h(self):
thread_update = threading.Thread(target=update)
@ -44,8 +45,8 @@ class MyBot(commands.Bot):
self.loop_12h.start() #Launching renew posts every 12 hours
self.loop_1h.start() #
print(datetime.now().strftime('%Y-%m-%d %H:%M:%S')+" Roboduck Bot started!")
async def on_mention(self, note: Note):
text=""
if (not note.author.is_bot):
@ -54,17 +55,17 @@ class MyBot(commands.Bot):
text = "@" + note.author.name + " " #Building the reply on same instance
else:
text = "@" + note.author.name + "@" + note.author.host + " " #Building the reply on foreign instance
text += create_sentence()
await note.reply(content=text) #Reply to a note
if __name__ == "__main__":
if (not os.path.exists(os.path.join(os.path.dirname(__file__), 'roboduck.db'))):
databasepath = (Path(__file__).parent).joinpath('roboduck.db')
if (not (os.path.exists(databasepath) and os.stat(databasepath).st_size != 0)):
init_bot()
bot = MyBot()
asyncio.run(bot.start(uri, token, timeout=600))
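
Note on the new startup check: the condition now treats a zero-byte `roboduck.db` like a missing database, presumably because the Docker instructions create the file with `touch` before the first run. The commit mixes `pathlib` for building the path with `os.path.exists` and `os.stat` for testing it. A pathlib-only version could look like the sketch below; the function name `database_is_initialized` is an assumption for illustration and does not appear in the commit.

```python
from pathlib import Path


def database_is_initialized(path):
    """Return True only if `path` is a regular file holding at least one byte.

    Hypothetical helper mirroring the check
    `os.path.exists(p) and os.stat(p).st_size != 0` from the diff.
    """
    db_file = Path(path)
    return db_file.is_file() and db_file.stat().st_size > 0
```

With such a helper, the main block would reduce to `if not database_is_initialized(databasepath): init_bot()`, and the same call could replace the repeated checks in `roboduck.py`.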


@ -1,2 +1,6 @@
markovify
configparser
configparser
ujson
requests
msgpack
regex


@ -2,10 +2,11 @@ import requests
import json
import os
import sys
import re
import regex
import configparser
import markovify
import sqlite3
from pathlib import Path
from datetime import *
from time import sleep
@ -16,25 +17,25 @@ def check_str_to_bool(text) -> bool:
return False
else:
return True
def get_notes(**kwargs):
noteid = "k"
sinceid = ""
min_notes = 0
notesList = []
returnList = []
if (kwargs):
if ("min_notes" in kwargs):
#print("min_notes found!")
init = True
min_notes = kwargs["min_notes"]
elif ("lastnote" in kwargs):
#print("Lastnote found!")
init = False
sinceid = kwargs["lastnote"]
else:
print("Wrong arguments given!")
print("Exiting routine!")
@ -43,54 +44,49 @@ def get_notes(**kwargs):
print("No arguments given!")
print("Exiting routine")
return None
#Load configuration
config = configparser.ConfigParser()
config.read(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
#print(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
url="https://"+config.get("misskey","instance_read")+"/api/users/show"
if (config.get("misskey","instance_read") == config.get("misskey","instance_write")):
host=None
else:
host=config.get("misskey","instance_read")
host=config.get("misskey","instance_read")
try:
req = requests.post(url, json={"username" : config.get("misskey","user_read"), "host" : host})
req.raise_for_status()
except requests.exceptions.HTTPError as err:
print("Couldn't get Username! " + str(err))
sys.exit(1)
userid = req.json()["id"]
#Read & Sanitize Inputs from Config File
#Read & Sanitize Inputs from Config File
try:
includeReplies = check_str_to_bool(config.get("markov","includeReplies"))
except (TypeError, ValueError) as err:
includeReplies = True
try:
includeMyRenotes = check_str_to_bool(config.get("markov","includeMyRenotes"))
except (TypeError, ValueError) as err:
includeMyRenotes = False
try:
excludeNsfw = check_str_to_bool(config.get("markov","excludeNsfw"))
except (TypeError, ValueError) as err:
excludeNsfw = True
run = True
oldnote=""
while run:
if ((init and len(notesList) >= min_notes) or (oldnote == noteid)):
break
try:
try:
req = requests.post("https://"+config.get("misskey","instance_read")+"/api/users/notes", json = {
"userId": userid,
"includeReplies" : includeReplies,
@ -100,172 +96,167 @@ def get_notes(**kwargs):
"excludeNsfw" : excludeNsfw,
"untilId" : noteid,
"sinceId" : sinceid
})
})
req.raise_for_status()
except requests.exceptions.HTTPError as err:
print("Couldn't get Posts! "+str(err))
sys.exit(1)
for jsonObj in req.json():
notesList.append(jsonObj)
notesList.append(jsonObj)
if (len(notesList) == 0):
print("No new notes to load!")
return 0
oldnote = noteid
noteid = notesList[len(notesList)-1]["id"]
print(str(len(notesList)) + " Notes read.")
print("Processing notes...")
for element in notesList:
lastTime = element["createdAt"]
lastTimestamp = int(datetime.timestamp(datetime.strptime(lastTime, '%Y-%m-%dT%H:%M:%S.%f%z'))*1000)
content = element["text"]
if content is None: #Skips empty notes (I don't know how there could be empty notes)
continue
content = re.sub(r"@([a-zA-Z0-9-]*(\.))*[a-zA-Z0-9-]*\.[a-zA-z]*", '', content) #Remove instance name with regular expression
content = regex.sub(r"(?>@(?>[\w\-])+)(?>@(?>[\w\-\.])+)?", '', content) #Remove instance name with regular expression
content = content.replace("::",": :") #Break long emoji chains
content = content.replace("@", "@"+chr(8203))
dict = {"id" : element["id"], "text" : content, "timestamp" : lastTimestamp}
returnList.append(dict)
return returnList
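
Note on the text clean-up in `get_notes`: the new pattern uses atomic groups, which is why the commit switches from `re` to the third-party `regex` module. The sketch below shows the same three clean-up steps with the standard library only. The pattern is my own approximation without atomic groups, and the function name `sanitize_note_text` is hypothetical; since the optional host part is the last element of the pattern, it should match the same text, but that has not been checked against the bot's real data.

```python
import re

ZERO_WIDTH_SPACE = chr(8203)

# Stdlib approximation of the commit's pattern (assumption, no atomic groups):
# "@user" optionally followed by "@host.example".
MENTION = re.compile(r"@[\w\-]+(?:@[\w\-.]+)?")


def sanitize_note_text(content):
    """Apply the three clean-up steps from the diff to one note's text."""
    content = MENTION.sub("", content)          # remove mentions, with or without host
    content = content.replace("::", ": :")      # break long emoji chains
    # Defuse any remaining "@" so generated text cannot ping anyone.
    content = content.replace("@", "@" + ZERO_WIDTH_SPACE)
    return content
```

For example, `sanitize_note_text("hi @alice@example.com and @bob!")` drops both mentions and keeps the surrounding words and punctuation.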
def calculate_markov_chain():
text = ""
#Load configuration
config = configparser.ConfigParser()
config.read(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
config.read((Path(__file__).parent).joinpath('bot.cfg'))
try:
max_notes = config.get("markov","max_notes")
except (TypeError, ValueError) as err:
max_notes = "10000"
databasepath = os.path.join(os.path.dirname(__file__), 'roboduck.db')
if (not os.path.exists(databasepath)):
print("Roboduck database already created!")
databasepath = (Path(__file__).parent).joinpath('roboduck.db')
if (not (os.path.exists(databasepath) and os.stat(databasepath).st_size != 0)):
print("Roboduck database not already created!")
print("Exit initialization!")
sys.exit(0)
with open(databasepath, 'r', encoding='utf-8') as emojilist:
database = sqlite3.connect(databasepath)
data = database.cursor()
data.execute("SELECT text FROM notes ORDER BY timestamp DESC LIMIT " + max_notes + ";")
rows = data.fetchall()
for row in rows:
text += row[0] + "\n"
markovchain = markovify.Text(text)
markovchain.compile(inplace = True)
markov_json = markovchain.to_json()
with open((os.path.join(os.path.dirname(__file__), 'markov.json')), "w", encoding="utf-8") as markov:
with open((Path(__file__).parent).joinpath('markov.json'), "w", encoding="utf-8") as markov:
json.dump(markov_json, markov)
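
Note on the `SELECT ... LIMIT " + max_notes + ";"` query: `max_notes` comes straight from `bot.cfg` as a string and is concatenated into the SQL text. SQLite accepts a bound parameter in `LIMIT`, so the value could be converted and bound instead. A minimal sketch, assuming the `notes` table from `init_bot`; the function name `newest_note_texts` is hypothetical.

```python
import sqlite3


def newest_note_texts(database, max_notes):
    """Return the text of the newest `max_notes` notes, newest first.

    `database` is an open sqlite3 connection; `max_notes` may be a string
    as read from the config file.
    """
    limit = int(max_notes)  # raises ValueError on a non-numeric config value
    rows = database.execute(
        "SELECT text FROM notes ORDER BY timestamp DESC LIMIT ?;", (limit,)
    ).fetchall()
    return [row[0] for row in rows]
```

The same binding would work for the `DELETE ... LIMIT` subquery in `clean_database`.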
def clean_database():
databasepath = os.path.join(os.path.dirname(__file__), 'roboduck.db')
if (not os.path.exists(databasepath)):
databasepath = (Path(__file__).parent).joinpath('roboduck.db')
if (not (os.path.exists(databasepath) and os.stat(databasepath).st_size != 0)):
print("No database found!")
print("Please run Bot first!")
sys.exit(0)
with open(databasepath, "a", encoding="utf-8") as f:
database = sqlite3.connect(databasepath)
#Reading config file bot.cfg with config parser
config = configparser.ConfigParser()
config.read(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
#print(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
config.read((Path(__file__).parent).joinpath('bot.cfg'))
#print((Path(__file__).parent).joinpath('bot.cfg'))
try:
max_notes = config.get("markov","max_notes")
except (TypeError, ValueError) as err:
max_notes = "10000"
data = database.cursor()
data.execute("DELETE FROM notes WHERE id NOT IN (SELECT id FROM notes ORDER BY timestamp DESC LIMIT " + max_notes + ");")
database.commit()
database.close()
def create_sentence():
with open((os.path.join(os.path.dirname(__file__), 'markov.json')), "r", encoding="utf-8") as markov:
with open((os.path.join((Path(__file__).parent), 'markov.json')), "r", encoding="utf-8") as markov:
markov_json = json.load(markov)
text_model = markovify.Text.from_json(markov_json)
note=""
#Reading config file bot.cfg with config parser
config = configparser.ConfigParser()
config.read(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
#print(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
config.read((Path(__file__).parent).joinpath('bot.cfg'))
#print((Path(__file__).parent).joinpath('bot.cfg'))
#Read & Sanitize Inputs
try:
test_output = check_str_to_bool(config.get("markov","test_output"))
except (TypeError, ValueError) as err:
#print("test_output: " + str(err))
test_output = True
if (test_output):
try:
tries = int(config.get("markov","tries"))
except (TypeError, ValueError) as err:
#print("tries: " + str(err))
tries = 250
try:
max_overlap_ratio = float(config.get("markov","max_overlap_ratio"))
except (TypeError, ValueError) as err:
#print("max_overlap_ratio: " + str(err))
max_overlap_ratio = 0.7
try:
max_overlap_total = int(config.get("markov","max_overlap_total"))
except (TypeError, ValueError) as err:
#print("max_overlap_total: " + str(err))
max_overlap_total = 10
try:
max_words = int(config.get("markov","max_words"))
except (TypeError, ValueError) as err:
#print("max_words: " + str(err))
max_words = None
try:
min_words = int(config.get("markov","min_words"))
except (TypeError, ValueError) as err:
#print("min_words: " + str(err))
min_words = None
if (max_words is not None and min_words is not None):
if (min_words >= max_words):
#print("min_words ("+str(min_words)+") bigger than max_words ("+str(max_words)+")! Swapping values!")
swap = min_words
min_words = max_words
max_words = swap
else:
tries = 250
max_overlap_ratio = 0.7
max_overlap_total = 15
max_words = None
min_words = None
"""
#Debug section to print the used values
print("These values are used:")
@ -276,7 +267,7 @@ def create_sentence():
print("max_words: " + str(max_words))
print("min_words: " + str(min_words))
"""
#Applying Inputs
note = text_model.make_sentence(
test_output = test_output,
@ -290,76 +281,73 @@ def create_sentence():
return note
else:
return "Error in markov chain sentence creation: Couldn't calculate sentence!\n\n☹ Please try again! "
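
Note on the config defaults in `create_sentence`: the `try`/`except (TypeError, ValueError)` blocks catch a malformed number, but as far as I can tell a missing option makes `config.get` raise `configparser.NoOptionError` (or `NoSectionError`), which is neither of those, so the default would not be applied in that case. `configparser` has typed getters with a `fallback` argument that cover the missing-option case. The sketch below is an assumption about how that could look, not the commit's code; `read_markov_settings` is a hypothetical name, and `getboolean` may accept different spellings than the bot's own `check_str_to_bool`.

```python
import configparser


def read_markov_settings(config):
    """Read the [markov] options used for sentence creation, with defaults.

    Missing options or a missing section fall back to the defaults from the
    diff; a present but malformed value still raises ValueError.
    """
    return {
        "test_output": config.getboolean("markov", "test_output", fallback=True),
        "tries": config.getint("markov", "tries", fallback=250),
        "max_overlap_ratio": config.getfloat("markov", "max_overlap_ratio", fallback=0.7),
        "max_overlap_total": config.getint("markov", "max_overlap_total", fallback=10),
        "max_words": config.getint("markov", "max_words", fallback=None),
        "min_words": config.getint("markov", "min_words", fallback=None),
    }
```

Wrapping the call in a single `except ValueError` would then keep the existing behaviour for malformed values while also handling options that are absent from `bot.cfg`.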
def update():
notesList = []
databasepath = os.path.join(os.path.dirname(__file__), 'roboduck.db')
if (not os.path.exists(databasepath)):
databasepath = (Path(__file__).parent).joinpath('roboduck.db')
if (not (os.path.exists(databasepath) and os.stat(databasepath).st_size != 0)):
print("No database found!")
print("Please run Bot first!")
sys.exit(0)
with open(databasepath, "a", encoding="utf-8") as f:
database = sqlite3.connect(databasepath)
print("Connected to roboduck.db successfully...")
data = database.cursor()
data.execute("SELECT id FROM notes WHERE timestamp = (SELECT MAX(timestamp) FROM notes);")
sinceNote = data.fetchone()[0]
notesList = get_notes(lastnote = sinceNote)
if (notesList == 0):
database.close()
return
print("Insert new notes to database...")
for note in notesList:
database.execute("INSERT OR IGNORE INTO notes (id, text, timestamp) VALUES(?, ?, ?)", [note["id"], note["text"], note["timestamp"]])
database.commit()
database.commit()
print("Notes updated!")
database.close()
print("Cleaning database...")
clean_database()
print("Database cleaned!")
print("Short sleep to prevent file collision...")
sleep(10)
print("Calculating new Markov Chain...")
calculate_markov_chain()
print("Markov Chain saved!")
print("\nUpdate done!")
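
Note on the insert loop in `update`: `INSERT OR IGNORE` makes the update safe to repeat, because a note whose `id` is already stored is skipped rather than raising an error. The loop could also be written as one batched call inside a transaction. This is a sketch under the assumption that each note is a dict with `id`, `text` and `timestamp`, as returned by `get_notes`; the name `insert_notes` is hypothetical.

```python
import sqlite3


def insert_notes(database, notes):
    """Insert notes, skipping ids that already exist; return the row count."""
    with database:  # commits on success, rolls back if an exception is raised
        database.executemany(
            "INSERT OR IGNORE INTO notes (id, text, timestamp) VALUES (?, ?, ?)",
            [(n["id"], n["text"], n["timestamp"]) for n in notes],
        )
    return database.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

Note that an ignored row keeps its original text, so an edited note would not be refreshed by this statement.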
def init_bot():
notesList = []
databasepath = os.path.join(os.path.dirname(__file__), 'roboduck.db')
if (os.path.exists(databasepath)):
databasepath = (Path(__file__).parent).joinpath('roboduck.db')
if (os.path.exists(databasepath) and os.stat(databasepath).st_size != 0):
print("Roboduck database already created!")
print("Exit initialization!")
sys.exit(0)
print("Creating database...")
with open(databasepath, "w+", encoding="utf-8") as f:
database = sqlite3.connect(databasepath)
print("Connected to roboduck.db successfully...")
print("Creating Table...")
database.execute("CREATE TABLE notes (id CHAR(10) PRIMARY KEY, text CHAR(5000), timestamp INT);")
print("Table NOTES created...")
#Load configuration
config = configparser.ConfigParser()
config.read(os.path.join(os.path.dirname(__file__), 'bot.cfg'))
config.read((Path(__file__).parent).joinpath('bot.cfg'))
try:
initnotes = int(config.get("markov","min_notes"))
except (TypeError, ValueError) as err:
@ -367,22 +355,21 @@ def init_bot():
initnotes=1000
print("Try reading first " + str(initnotes) + " notes.")
notesList = get_notes(min_notes = initnotes)
print("Writing notes into database...")
for note in notesList:
database.execute("INSERT INTO notes (id, text, timestamp) VALUES(?, ?, ?)", [note["id"], note["text"], note["timestamp"]])
database.commit()
database.commit()
database.close()
print("Notes written...")
print("Creating Markov Chain")
calculate_markov_chain()
print("Markov Chain calculated & saved.\n")
print("Finished initialization!\n")
print("The bot will now be started!")


@ -1,3 +1,3 @@
from roboduck import *
update()
update()