A Tool to Import Evernote notes to Anytype

Shampra · March 23, 2024, 8:37pm

?
The converter doesn’t use test files unless you set “--test” as a parameter.
Are you using the latest release or source code?

I imported around 300 notes to validate the operation: all the discrepancies detected are documented on GitHub (or almost all; some links are also detected incorrectly).

If your notes are complete HTML pages (e.g. notes created via the webclipper), this is already noted in the list. Nothing more to do, I’ll look into it when I have time.

If it’s not that, I don’t have this problem, so I need an ENEX example to reproduce the bug.
If you can find a short note and anonymize it (or find the part of the ENEX that’s causing the problem and share it), I could take a look.

Like I said, it works for me, except for the bugs already reported on GitHub.
ENEX contains HTML, so my converter parses this HTML.
Github is there to share, but also to enable discussions and contributions: if you have a suggestion, don’t hesitate to post it.
As for the bugs you’ve fixed, it’s better to share them, isn’t it?

UBr · March 25, 2024, 8:59pm

Thank you, this is very welcome and will certainly help many people who want to trust a reliable way to migrate from Evernote to Anytype!

I used the source code that was available on GitHub on March 14.

Only a small fraction of my notes are clipped HTML pages. Most of the discrepancies I found seem rooted in the way Evernote internally encoded notes in varying styles of HTML over the years, instead of e.g. Markdown (my oldest notes in Evernote date back to 2009).

I am neither a programmer nor familiar with GitHub, therefore I simply downloaded the source code and tried to let it run on my Mac. I had manually compared several hundred imported notes in Anytype with the original ENEX files, and took notes of major findings, in order to share them with you.

Unfortunately, as said, I was interrupted at this, but I’ll do my best in the next couple of days to find a way to anonymize the most troublesome ENEX files and share them with you.

UBr · March 25, 2024, 9:46pm

Let me start with how I fixed the bugs that threw exceptions, dear @Shampra. I fixed and documented them in my copy of your March-14 source code that you find below:

line 64: [Errno 63] File name too long => truncated the name to 120 characters with filename = filename[:120] in line 66, right after filename = filename.replace(char, '_') in the source code below.
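
A minimal sketch of such a truncation, assuming the limit applies to the encoded byte length of the name rather than to its character count. The helper name `truncate_filename` and the 120-byte budget are illustrative and not part of converter.py:

```python
def truncate_filename(name: str, max_bytes: int = 120) -> str:
    """Cut a file name so that its UTF-8 encoding fits in max_bytes."""
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    # errors="ignore" drops a multi-byte character that was cut in half
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Cutting on bytes matters mostly for titles with accented or non-Latin characters, where 120 characters can encode to far more than 120 bytes.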

line 88: cssValue is deprecated => replaced return int(cssutils.css.CSSValue(value_str).value * 16) with a fixed return value of 16.
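
As an alternative to a fixed value, the length could be parsed without cssutils. This is only a sketch: `css_length_to_px` is a hypothetical helper, the 16 px per em factor is the usual browser default font size taken as an assumption, and anything that is not a plain em or px value falls back to the same default of 16:

```python
import re

def css_length_to_px(value_str: str, em_in_px: float = 16.0, default: int = 16) -> int:
    """Convert a CSS length such as '2em' or '40px' to whole pixels."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(em|px)\s*", value_str)
    if match is None:
        return default  # unknown unit or a keyword such as 'auto'
    number, unit = float(match.group(1)), match.group(2)
    return int(number * em_in_px) if unit == "em" else int(number)
```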

line 201: ValueError: invalid literal for int() with base 10: ‘0.000000%’ in return tuple(int(x) for x in rgb.split(",")) => applied an elegant solution I found in python - What is a clean way to convert a string percent to a float? - Stack Overflow
return tuple(int(float(x.replace('%', 'e-2'))) for x in rgb.split(","))
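
One caveat with the one-liner above: replacing '%' with 'e-2' turns 100.000000% into 1.0, so rgb(0%, 0%, 100%) would become (0, 0, 1) rather than (0, 0, 255). A hedged sketch of a variant that scales percentages to the 0-255 range, assuming the input is the comma-separated content of rgb(...):

```python
def rgb_to_tuple(rgb: str) -> tuple:
    """Parse 'r, g, b' where each part is either 0-255 or a percentage."""
    def component(part: str) -> int:
        part = part.strip()
        if part.endswith("%"):
            value = float(part[:-1]) * 255 / 100  # CSS percentages scale to 0-255
        else:
            value = float(part)
        return max(0, min(255, round(value)))  # clamp to the valid range
    return tuple(component(p) for p in rgb.split(","))
```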

line 671: ValueError: could not convert string to float: ‘auto’ in relative_width = float(embed_width.replace("px", "")) / original_width => added try: ... except ... relative_width = 1
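
The same guard can be isolated in a small helper. This is a sketch under assumptions: `relative_width` as a standalone function is hypothetical, and it returns None (meaning "no explicit size") instead of 1 when the width cannot be parsed, which matches the initial value used in the source below but may not be what the converter expects:

```python
from typing import Optional

def relative_width(embed_width: Optional[str], original_width: int) -> Optional[float]:
    """Return embed_width / original_width, or None when it cannot be computed."""
    if not embed_width or not original_width:
        return None
    try:
        return float(embed_width.replace("px", "").strip()) / original_width
    except ValueError:
        return None  # e.g. width="auto" or a percentage
```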

line 792: XML parsing error: out of memory on a Mac M1 with 16GB of RAM for an ENEX file with 6483 notes in for note_xml in root.iter("note"): => no fix in the source code, but split the ENEX file into 6483 individual ENEX files with 1 note each.
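
A possible alternative to splitting the file is to stream the notes with the standard library's iterparse, so that only one note is held in memory at a time. This is a sketch, not the converter's code: `iter_notes` is a hypothetical helper, and whether it avoids the out-of-memory error on a file of that size is untested:

```python
import xml.etree.ElementTree as ET

def iter_notes(enex_path: str):
    """Yield <note> elements one at a time instead of loading the whole file."""
    context = ET.iterparse(enex_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of <en-export>
    for event, elem in context:
        if event == "end" and elem.tag == "note":
            yield elem
            root.clear()  # free the notes that were already processed
```

It would be used as `for note_xml in iter_notes(enex_file):` in place of reading the file into a string and calling ET.fromstring.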

UBr · March 25, 2024, 10:10pm

Here’s the source code with all the above fixes to converter.py of March 14. I give it to you in two chunks to circumvent this forum’s 32000 character limit. The first chunk has lines 1–201, the second chunk has lines 623–872:

First chunk, lines 1–201:

# import pdb
import argparse
import shutil
from bs4 import BeautifulSoup
import json
import random
import xml.etree.ElementTree as ET
from scipy.spatial import cKDTree
import hashlib
import os
import base64
import re
from datetime import datetime
import time
from typing import List, Type
import logging
import inspect
import cssutils


from models.language_patterns import language_patterns
import models.mime, models.json_model as Model
from models.options import Options
import warnings

# Ignore BeautifulSoup warnings
warnings.filterwarnings("ignore", category=UserWarning, module="bs4")

# Declare options as a global variable
my_options = Options()
my_options.is_debug = False


# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)-8s - %(funcName)-2s l.%(lineno)d - %(message)s',
    handlers=[
        logging.FileHandler("debug.log")
    ]
)
logger = logging.getLogger(__name__)


def log_debug(message: str, level: int = logging.DEBUG):
    if my_options.is_debug:
        if level >= logging.DEBUG:
            caller_frame = inspect.stack()[1]
            caller_func = caller_frame[3]
            caller_lineno = caller_frame[2]
            logger.log(level, f"{caller_func} l.{caller_lineno} - {message}")
        elif 'TERM_PROGRAM' in os.environ.keys() and os.environ['TERM_PROGRAM'] == 'vscode': # debug in dev mode
            print(message)
            pass
            # logger.log(level, message)
    if level > logging.DEBUG:
        print(message)
    
        

def sanitize_filename(filename):
    invalid_chars = '/\\?%*:|"<>'
    for char in invalid_chars:
        filename = filename.replace(char, '_')
    # otherwise line 811 OSError: [Errno 63] File name too long
    # (truncate once, after all replacements)
    filename = filename[:120]
    return filename


def generate_random_id(length = 24):
    """Generate a random hexadecimal identifier of the given length"""
    
    hex_chars = '0123456789abcdef'
    id = ''.join(random.choice(hex_chars) for _ in range(length))
    return id


def extract_shifting_left(div):
    """Extract the value of margin-left or padding-left"""
    style = div.get('style')
    if style:
        style_properties = style.split(';')
        for prop in style_properties:
            if 'margin-left' in prop or 'padding-left' in prop:
                value_str = prop.split(':')[1].strip()
                if 'em' in value_str:
                    # deprecated:
                    # return int(cssutils.css.CSSValue(value_str).value * 16)
                    return 16
                elif 'px' in value_str:
                    # gives error:
                    # return int(value_str.replace('px', '').strip())
                    return 16
                else:
                    return 16
    return 0

def extract_tag_info(contenu_div, tags_list):
    """Extract info about tag in this content

    Args:
        contenu_div (_type_): _description_
        tags_list (list): list of tag to treat

    Returns:
        list: {
            'tag_object': tag,
            'text': text into this tag,
            'start': starting position in contenu_div,
            'end': end position in contenu_div
        }
    """    
    # Parse the HTML content
    
    ######### Tag with everything
    
    # Rebuild a soup object from the tag object that was passed in
    contenu_str = str(contenu_div)
    soup = BeautifulSoup(str(contenu_div), 'html.parser')
    # TODO: see whether a soup could be passed instead of a tag?
    
    # Initialize a list to store the tag information
    tags_info = []

    for tag in soup.find_all(tags_list):     
        text = tag.get_text()

        # Count the text characters of contenu_div up to the position of the tag
        content_to_count = contenu_str[0:tag.sourcepos]
        soup_to_count = BeautifulSoup(content_to_count, 'html.parser')
        start = len(soup_to_count.get_text())
        end = start + len(text)

        log_debug(f"--- 'tag_name': {tag.name}, 'text': {text}, 'start': {start}, 'end': {end}, 'tag position' : {tag.sourcepos}", logging.NOTSET)
        # Add the tag information to the list
        tags_info.append({
            'tag_object': tag,
            'text': text,
            'start': start,
            'end': end
        })
    return tags_info


def extract_styles(style_string):
    """Build a dictionary of styles from the style attribute

    Args:
        style_string (string): content of the style attribute

    Returns:
        dict: CSS style -> value pairs
    """
    style_dict = {}
    if style_string:
        # Special cases to handle:
        ## href:https://example.com
        ## and background:(...) url(&quot;data:image/svg+xml;base64,PHN2ZyB3(...)
        style_pairs = re.findall(r'([^:]+):([^;]+);', style_string)
        for key, value in style_pairs:
            style_dict[key.strip()] = value.strip()
            
    return style_dict


# AT color based on the Evernote RGB
def extract_color_from_style(style):
    """Transform RGB or Hexa color from Evernote to a AT color (limited)

    Args:
        style (string): "rgb()" or "#xxxxxx"

    Returns:
        string: color name
    """
    colors = {
        "grey": (182, 182, 182),
        "yellow": (236, 217, 27),
        "orange": (255, 181, 34),
        "red": (245, 85, 34),
        "pink": (229, 28, 160),
        "purple": (171, 80, 204),
        "blue": (62, 88, 235),
        "ice": (42, 167, 238),
        "teal": (15, 200, 186),
        "lime": (93, 212, 0),
    }
    # For background-color, map 1 to 1
    EN_bck_color ={
        "yellow": (255, 239, 158),
        "orange": (255, 209, 176),
        "red": (254, 193, 208),
        "purple": (203, 202, 255),
        "blue": (176, 236, 244),
        "lime": (183, 247, 209),
        "black": (51,51,51) # color set automatically in some cases by EN (there is no black in AT, but it will be ignored)
    }

    def rgb_to_tuple(rgb):
        # gives error for x='0.000000%'
        # return tuple(int(x) for x in rgb.split(","))
        # very elegant solution by https://stackoverflow.com/a/69308473
        return tuple(int(float(x.replace('%', 'e-2'))) for x in rgb.split(","))

Second chunk, lines 623–end:

def process_div_children(div, page_model: Model.Page, files_dict, cell_id=None):
    """_summary_

    Args:
        div (_type_): _description_
        page_model (Model.Page): _description_
        files_dict (_type_): _description_
        table (bool, optional): Indicate if it's a loop for a table or the default treatment. Defaults to False.
    """
    log_debug(f"- Converting childrens...", logging.DEBUG)
    # Block-level tags to process
    balisesBlock = ['div', 'hr', 'br', 'h1', 'h2', 'h3','en-media','table']
    children = div.find_all(balisesBlock)
    for child in children:
        # Element inside a table: skip it, because all table elements (div, media, ...) are handled with the table
        if not cell_id and child.find_parent('td'):
            continue
        # Processing a cell: the ID is already defined
        if cell_id:
            div_id = cell_id
        else:
            div_id = generate_random_id()
        shifting_left = extract_shifting_left(child)
        div_text = extract_top_level_text(child)
        div_tag = child.get('id')

        # Start with the blocks that have no text
        if child.name == 'hr':
            page_model.add_block(div_id, shifting=shifting_left)
            page_model.edit_block_key(div_id, "div",{})
        elif child.name == 'br':
            page_model.add_block(div_id, shifting=shifting_left, text = "")
        # Process the files to embed
        elif child.name == 'en-media':
            hash = child.get('hash')
            if hash in files_dict:
                sanitized_filename, mime, file_size, file_type = files_dict[hash]
                # Resized? Return width="340px" divided by --en-naturalWidth, e.g. style="--en-naturalWidth:1280; --en-naturalHeight:512;" width="340px" />
                text_style = child.get('style')
                styles = extract_styles(text_style) if text_style else {}
                
                embed_width = child.get('width')
                original_width = int(styles.get("--en-naturalWidth", "0"))
                
                relative_width = None  
                if embed_width is not None and original_width is not None and original_width != 0:
                    # ValueError: could not convert string to float: 'auto'
                    try:
                        relative_width = float(embed_width.replace("px", "")) / original_width
                    except ValueError as e:
                        log_debug(f"Error with float(embed_width): {e}", logging.ERROR)
                        # fall back to full width and still add the file block below
                        relative_width = 1
                # Link format?
                style_attr = child.get('style')
                format = 'link' if style_attr and '--en-viewAs:attachment;' in style_attr else None
                page_model.add_block(div_id, shifting=shifting_left)
                page_model.add_file_to_block(div_id, hash = hash, name = sanitized_filename, file_type = file_type, mime = mime, size = file_size, embed_size = relative_width, format=format )
            
                # TODO: once Anytype allows file import

        # Process code block (root div without text)
        elif child.name == 'div' and 'style' in child.attrs and '--en-codeblock:true' in child['style']:
            process_codeblock(child, div_id, page_model)
        # Process table
        elif child.name == 'table':
            process_table(child, page_model)
        # Process blocks that require text content
        elif div_text:
            # child divs of code blocks must be excluded from the general processing
            parent_div = child.find_parent('div')
            if child.name == 'div' and parent_div and 'style' in parent_div.attrs and '--en-codeblock:true' in parent_div['style']:
                pass
            # Specific processing
            elif child.name in ['div', 'h1', 'h2', 'h3']:
                # Specific processing for lists!
                parent_list = child.find_parent(['ol', 'ul'])
                if parent_list:
                    # Is it in a nested list? First step: be able to place the childrenIds!
                    # TODO: add the nesting to the existing nesting? If padding = 40 and nesting = 40: treat as 80?
                    #       To be tested: which cases EN can generate...
                    nested_level = len(parent_list.find_parents(['ol', 'ul']))
                    if nested_level > 0:
                        # Handle it like shifted blocks...
                        shifting_left = 40 * (nested_level)

                # Then create the block
                page_model.add_block(div_id, shifting=shifting_left)
                
                # Process text
                extract_text_with_formatting(child, div_id, page_model)

                # Process block styles
                style = extract_styles(child.get('style'))
                if 'padding-left' in style:
                    # Already handled, nothing to do
                    pass
                elif '--en-codeblock' in style:
                    # Handling of all sub-blocks to be defined later
                    pass
                    pass
                elif 'text-align' in style:
                    if style['text-align'] == 'center':
                        page_model.edit_block_key(div_id,"align","AlignCenter")
                    elif style['text-align'] == 'right':
                        page_model.edit_block_key(div_id,"align","AlignRight")

                # And the style if it is a list
                if parent_list:
                    if parent_list.name == 'ol':
                        style_liste = 'Numbered'
                    elif parent_list.name == 'ul' and parent_list.has_attr('style') and '--en-todo:true' in parent_list['style']:
                        style_liste = 'Checkbox'
                        li_parent = child.find_parent('li')
                        if li_parent and li_parent.has_attr('style') and '--en-checked:true' in li_parent['style']:
                            page_model.edit_text_key(div_id,"checked",True)
                    else:
                        style_liste = 'Marked'
                    page_model.edit_text_key(div_id,"style",style_liste)
                
                # and the heading style
                if child.name in ['h1', 'h2', 'h3']:
                    page_model.edit_text_key(div_id,"style","Header" + child.name[1:])


def convert_files(enex_files_list: list, options: Type[Options]):
    """Convert enex file from the list into json files

    Args:
        enex_files_list (list): list of enex file to convert

    Returns:
        string: number of notes converted
    """
    
    log_debug(f"-----CONVERTING-----", logging.DEBUG)
    if not enex_files_list:
        log_debug("No file to convert.", logging.INFO)
        return
    
    source_folder = os.path.dirname(enex_files_list[0])
    if options.zip_result:
        working_folder = os.path.join(source_folder, "Working_folder")
    else:
        working_folder = os.path.join(source_folder, "Converted_files")
    os.makedirs(working_folder, exist_ok=True)
    files_dest_folder = os.path.join(working_folder, "files")
    
    # Add Relation "Evernote tag"
    dirname = os.path.dirname(__file__)
    relation_file = os.path.join(dirname, "models/Evernote_Tag_Relation.json")
    shutil.copy(relation_file,working_folder)
    
    nb_notes = 0
    for enex_file in enex_files_list:
        log_debug(f"Converting {os.path.basename(enex_file)}...", logging.INFO)
        with open(enex_file, 'r', encoding='utf-8') as xhtml_file:
            file_content = xhtml_file.read()
            if not file_content:
                log_debug(f"No content in file", logging.ERROR)
                return
        
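        try:
            root = ET.fromstring(file_content)
        except ET.ParseError as e:
            log_debug(f"XML parsing error : {e}", logging.ERROR)
            continue  # skip this file: root is undefined after a failed parse
        except Exception as e:
            log_debug(f"XML treatment error : {e}", logging.ERROR)
            continue  # skip this file: root is undefined after a failed parse
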
        # is unique or multiple note?
        for note_xml in root.iter("note"):
            log_debug(f"Treatment note {nb_notes}...", logging.INFO)
            # Process files (base64 to files)
            files_dict = get_files(note_xml, files_dest_folder)

            # Use the Model.Page class to build the JSON
            page_model: Model.Page = Model.Page()

            # Extract the content of the <content> tag and process it
            content_element = note_xml.find('content')
            if content_element is None or content_element.text is None:
                log_debug(f"Note {nb_notes} has no content!", logging.DEBUG)
                continue
            
            content: str = content_element.text
            process_content_to_json(content, page_model, files_dict)
            
            # Processing xml tags (other than <content>)
            process_details_to_json(note_xml, page_model, working_folder)

            # Clean up the "shifting" keys if needed
            page_model.cleanup()

            # Build the JSON file name by removing the .enex extension
            # json_file_name = os.path.splitext(os.path.basename(enex_file))[0] + '.json'
            
            note_title = page_model.page_json["snapshot"]["data"]["details"]["name"]
            # Filename with the create date, in case several notes have the same title
            creation_date: str = page_model.get_creation_date()
            filename = f"{sanitize_filename(note_title)}_{creation_date}.json"
            with open(os.path.join(working_folder, filename), 'w', encoding='utf-8') as file:
                json.dump(page_model.to_json(), file, indent=2)
            nb_notes += 1
    
    # Zip the result
    if options.zip_result:
        log_debug(f"Create zip file", logging.DEBUG)
        current_time = datetime.now()
        zip_name = current_time.strftime("ConvertedFiles_%d%m%Y_%H%M%S")
        zip_path = os.path.join(source_folder, zip_name)
        shutil.make_archive(zip_path, 'zip', working_folder)
        shutil.rmtree(working_folder)

    log_debug(f"Conversion completed: {nb_notes} notes converted", logging.INFO)
    return nb_notes

def main():
    # pdb.set_trace()
    
    # Directory containing the test enex files
    # enex_directory = 'Tests/Temp/'
    enex_directory = '.'
    # enex_files = [os.path.join(enex_directory, f) for f in os.listdir(enex_directory) if f.endswith('Carnet export test 2.enex')]
    enex_files = [os.path.join(enex_directory, f) for f in os.listdir(enex_directory) if f.endswith('.enex')]
    
    parser = argparse.ArgumentParser(description="Convert ENEX files.")
    parser.add_argument("--enex_files", nargs="+", help="List of ENEX files to convert", default=enex_files)
    parser.add_argument("--zip", action="store_true", default=True, help="Create a zip file")
    parser.add_argument("--debug", action="store_true", default=False, help="Create a debug file")
    

    args = parser.parse_args()
    
    # my_options.tag = "Valeur pour le tag"
    # my_options.import_notebook_name = args.zip
    my_options.is_debug = args.debug #args.debug
    my_options.zip_result = args.zip
    if args.enex_files: # dev mode
        enex_files = args.enex_files
        my_options.is_debug = True
        my_options.zip_result = False
    
    log_debug(f"Launched with CLI", logging.DEBUG)
    # List of enex files in the directory
    convert_files(enex_files, my_options)

    

if __name__ == "__main__":
    main()

Shampra · March 26, 2024, 8:02am

Phew!
I’m looking for a solution but haven’t found anything yet. I’ve also been in touch with the Anytype team, but no leads there either. For these notes, the “least worst” would be to export/import as HTML (but you lose all the tags).

And yes, it’s a pain to see that Evernote has changed the way it codes each element over time!
ah, the pleasure of seeing that the same already coded element is different in another note…
For my part, my notes also date back to 2012 (maybe not in my test set, I’ll check) and almost all with the Legacy version. I don’t like the latest versions, so I went through them to make an “up-to-date” converter.

Thanks for your fixes, I’ll take a look and integrate them as soon as I have time.
I’m not a pro developer either so any help is greatly appreciated!

You should update: some of your problems were already corrected (e.g. max filename length).
The last 2 versions add several fixes, a choice between command line with parameters and a graphical interface, and file import.

UBr · March 26, 2024, 9:16am

Thank you for your swift response, dear @Shampra. Yes, I was super impressed by how you’ve built what seems like an entire browser in order to interpret the ENEX files’ internal HTML code and render it to JSON.

While I was debugging the exceptions I found, I was wondering if there wasn’t an existing library somewhere to convert shabby HTML straight into clean Markdown, and try to take that to Anytype’s JSON format. Or alternatively, to rely on Anytype’s upcoming Web clipper to do that work also on Evernote’s internal ENEX code.

But then we’d still have to deal with Evernote’s own HTML style, and to preserve things like color highlighting (which I’ve been using a lot in Evernote).

Thank you, this is so generous. But no rush. At least on my end, I’m super glad that I found the vzhd1701/evernote-backup tool to rescue all my notes from Evernote’s servers into local ENEX files before they shut it all down completely, as the new owners seemed to choke off users’ access more and more now, and rather train their AI with our data. So knowing my stuff is here gave me some rest.

Shampra · March 26, 2024, 10:59am

Whaaaat?
Okay, more robust code is always good, but I’ve never seen an rgb with this, it’s weird!

If you have some time, I’d like to see the ENEX code that generated the error.
A priori it would be a string like 0.000000% inside an rgb(...) in the ENEX file.
If you can find it, please give me the entire div element (without the content if it is sensitive).
Thank you

For the other issues, either they had already been fixed or I opened tickets on GitHub.

UBr · March 28, 2024, 7:41pm

The unusual color encoding with 0.000000% has been in content I had copied and pasted from a website in 2019. I guess that Evernote simply imported the formatting without sanitizing or simplifying it — it’s not critical of course, but it stopped the conversion process:

    <content>
      <![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"><en-note><div>Lorem ipsum this is my content.</div><div><br /></div><hr /><div title="Page 1"></div><div style="margin-top: 1em; margin-bottom: 1em;" title="Page 1"><span style="font-size: 8.000000pt; font-family: 'Verdana'; color: rgb(0.000000%, 0.000000%, 100.000000%);-en-paragraph:true;" title="Page 1"></span><span style="font-size: 10.000000pt; font-family: 'Verdana,Bold';-en-paragraph:true;" title="Page 1">Lorem ipspm this is pasted content from a web page</span></div>

Much worse is another example that loses content: the text immediately after the <en-note> tag, and also e.g. after <ul><li>, where content is not enclosed in <div>. This is a pattern I found very frequently in ENEX files.
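
To spot such notes before converting, the text sitting directly under a tag can be listed with the standard library. This is a sketch under assumptions: `loose_text` is a hypothetical helper, and the input must be well-formed XML without undefined entities such as &nbsp; (real ENML may need extra handling):

```python
import xml.etree.ElementTree as ET

def loose_text(enml: str) -> list:
    """Return text that sits directly under the root tag, outside any child element."""
    root = ET.fromstring(enml)
    # root.text is the text before the first child, child.tail the text after each child
    pieces = [root.text] + [child.tail for child in root]
    return [p.strip() for p in pieces if p and p.strip()]
```

The same idea applies to an <li> element: pass its serialized XML instead of the whole <en-note>.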

But here, even more odd, converter.py (or Anytype?!) assigns <a> hyperlinks to the wrong words in the text, which is totally weird.

This is also a very frequent pattern, not only in pasted web content. Check this out: it’s the original ENEX file (where I’ve simply replaced all words between tags with digits). You can see the huge difference when you import it into Evernote:

<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
<en-export export-date="20240328T122730Z" application="Evernote" version="10.10.5">  <note>
    <title>Testfile 4 - random formatting, Hyperlink wrongly assigned.enex</title>
    <created>20150606T144534Z</created>
    <updated>20150606T144534Z</updated>
    <tag>1111</tag>
    <tag>1111111</tag>
    <tag>1111111</tag>
    <tag>1111111</tag>
    <tag>11111111</tag>
    <tag>111</tag>
    <tag>111</tag>
    <tag>1111111</tag>
    <note-attributes>
      <source>1111111</source>
      <source-url>111111111111111111111111111111111111111111111111111111111111</source-url>
    </note-attributes>
    <content>
      <![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"><en-note>111111111 222 33333333 44 555 6666 77 8888888888888 99999 000000 111 22 3333333 444 555555 6666 77 8888888 99 0000 111 222 33333 44444<hr/><br/><div><div><div><div><h1><div></div><div><span>111 2222222222 33 4444 555555 6666</span></div></h1></div>
<div>
<strong>
1111111
</strong>
1

3


666666666 777777777 88
99 000000000

<a href="http://www.ted.com/talks/dan_gilbert_you_are_always_changing/transcript?language=en" target="_blank">

<span>1111 22222222222 3333333333</span>
</a>
</div>
























<p>
11111111111 222222 333 44444 55 66666666 7777 8888888888 99999 000000000000 111111111111111 222 3333333 444444 555555 66666666 77 8 9999999999 00 11111 222 333333333 44 5555555 666666666666666 77777 88 9999999 0000000 1111 222 333333 44 555 66666 777 88 999 000000 1111111111 22 333 444 5555 66 77777 88888 99999999999 000 111 22222
</p>
<div>
<ul>
<li>
<a href="http://www.ted.com/talks/dan_gilbert_you_are_always_changing/transcript?language=en" target="_blank">11111111111 222222222222222222222 3333333333</a>
</li>
</ul>
</div>
<div>
<div>

<div>
<div>
<a href="http://www.ted.com/speakers/dan_gilbert" target="_blank">111 2222222</a>
</div>
<div>
1111111111111 222222222 333333
</div>
<div>
1111111 222222222222 333 4444444 5555 666 7777777 88888 9999 0000 1111 22 33333 444 55555 66666 7 8 9999999 00 11111111 2222 3333333333 444444444 555 66666666 77 888 9999999999 000 111111111111 22222 33333 444444444 55 6666666666
<a href="http://www.ted.com/speakers/dan_gilbert" target="_blank">1111 222</a>
</div>
</div>
</div>
</div>


<div>
1111 2222 333 444444444 55 66 77777777 888 99999999999 000 111 22222222 33 444 5555555 66 777 8888 99999
</div>

</div></div></div><br/></en-note>]]>
    </content>
  </note>
</en-export>

Originally it’s meant to look like this:

and imported, it looks like this — notice the randomly assigned hyperlinks, you see it much clearer in the source code:

Shampra · March 29, 2024, 4:12pm

…

More recent notes are better exported, in this one the xml code is… not very clean?
I’ve listed the points to look at, so I’ve got plenty to keep me busy!

You can follow it here :

github.com/Shampra/EvernoteToAnytype

(From UBr) rather special html badly converted

opened 03:54PM - 29 Mar 24 UTC

Shampra

bug


And an easier-to-understand version of your note
(with a first patch for text outside of any element)

UBr · March 30, 2024, 9:40pm

Thank you so much for your first fixes, @Shampra!

Re. the “anonymizing” of the test files, I wasn’t sure what would be a good way to keep the notes 1:1 with the originals while making sure no personal data was published (including lots of personal data of other people in project notes etc). Therefore I refrained from lorem ipsum and devised the number counter.

E.g. in the first line starting with <strong> after the <hr> and the headline —
1111111 1 3 666666666 777777777 88 99 000000000 1111 22222222222 3333333333
— had you noticed that in the converted version of this line, the original hyperlink 1111 22222222222 3333333333 was assigned to 666666666 7 when imported to Anytype?

I could not see that super strange bug anymore in your screenshot of the patched converter output… but I can’t tell either whether it disappeared due to your patch or due to your understandable effort to make the note more legible.
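
The thread does not confirm what caused the shifted hyperlinks, but a link that lands on the wrong words suggests that its start and end offsets were counted against a different text than the one finally written. One way to keep them consistent is to count text and record link positions in a single pass. This is a sketch with the standard library; `LinkOffsets` and `link_offsets` are hypothetical names, not converter.py code:

```python
from html.parser import HTMLParser

class LinkOffsets(HTMLParser):
    """Record (href, start, end) text offsets for every <a> in one pass."""
    def __init__(self):
        super().__init__()
        self.text_len = 0     # characters of text seen so far
        self.open_links = []  # stack of (href, start) for <a> tags not yet closed
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.open_links.append((dict(attrs).get("href"), self.text_len))

    def handle_endtag(self, tag):
        if tag == "a" and self.open_links:
            href, start = self.open_links.pop()
            self.links.append((href, start, self.text_len))

    def handle_data(self, data):
        self.text_len += len(data)

def link_offsets(html: str) -> list:
    parser = LinkOffsets()
    parser.feed(html)
    parser.close()
    return parser.links
```

Whitespace is counted exactly as it appears in the source, so any trimming applied to the block text would have to be applied to the offsets as well.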

Shampra · March 30, 2024, 9:59pm

Surely (I hope) with current developments. I could retest with the original note after finishing that, but it’s much easier to work with!
In fact, once published, you can test with the real note and tell me again.

A few points are problematic and will require more time, but it’s already much better…

Shampra · March 31, 2024, 4:17pm

Done in the next release

And… here we go!

V0.38.6 is here, with many, many corrections and improvements, especially for HTML integration.

UBr · April 1, 2024, 6:19pm

Hey that’s great news, dear @Shampra! Will be happy to run it again and feed back.

Shampra · April 19, 2024, 12:18pm

Hello!

A quick question for you, dear Evernote users who are thinking of migrating to Anytype one day.

Simple tables in Anytype don’t currently support much. Remember to vote here.

But Evernote note conversion can keep one (and only one) item per cell.
An image, a checkbox, etc…
Cool! Except that everything else is lost.
An image and text? No way, the text is lost.
Two checkboxes? Not possible: this would result in one checkbox with the text of both (so the status of the second is lost).
A title and text? Not possible.

Another problem: Anytype doesn’t really handle this, so you won’t be able to edit an image or checkbox cell without losing this element.

In my opinion, it’s always better than importing nothing, but I’d like to hear your opinions too, to guide development in the best possible way.

What are your use cases with several elements in the same cell?
Do we keep the images, checkboxes etc. or do we throw everything away and keep only the text?

Shampra · April 20, 2024, 4:27pm

V0.38.8 is here
A near-final version, with a complete overhaul of table integration

For a single image or checkbox, the cell is converted as such and can be imported into Anytype.
For multiple checkboxes, the cell is converted to text so that nothing is lost.
For merged cells, nothing is lost. Of course, the merge itself is dropped in Anytype (the content is duplicated in each of the cells).
And other things, such as keeping embedded YouTube videos, fixes, etc.
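The cell rules above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the converter's actual code: it relies only on the ENML tags `en-media` (attachment, with a `type` attribute) and `en-todo` (checkbox, with a `checked` attribute), the function names are hypothetical, and two choices are mine: an image accompanied by text falls back to text, and checkboxes are written as `[x]` / `[ ]`.

```python
import xml.etree.ElementTree as ET


def classify_cell(td_xml: str) -> str:
    """Return 'image', 'checkbox' or 'text' for one <td> element."""
    cell = ET.fromstring(td_xml)
    images = [e for e in cell.iter("en-media")
              if e.get("type", "").startswith("image/")]
    todos = list(cell.iter("en-todo"))
    text = "".join(cell.itertext()).strip()
    if len(images) == 1 and not todos and not text:
        return "image"      # a lone image is kept as an image
    if len(todos) == 1 and not images:
        return "checkbox"   # a lone checkbox keeps its label and state
    return "text"           # everything else is flattened to text


def cell_as_text(td_xml: str) -> str:
    """Flatten a cell to one line of text, writing checkboxes as [x] / [ ]."""
    parts = []

    def walk(el):
        if el.tag == "en-todo":
            parts.append(" [x] " if el.get("checked") == "true" else " [ ] ")
        if el.text:
            parts.append(el.text)
        for child in el:
            walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(ET.fromstring(td_xml))
    # Collapse runs of whitespace left over from the original markup.
    return " ".join("".join(parts).split())
```

With two checkboxes in one cell, `classify_cell` returns `'text'` and `cell_as_text` keeps both states, so the status of the second checkbox is no longer lost.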

Now I’m waiting for a few bugs to be fixed on the Anytype side; then I’ll release version 1.0 with a few final fixes.

Mow · April 22, 2024, 6:44am

used to compare lists, I’ve shared an example here : Images or Blocks in Simple Tables - #5 by Shampra
note layout, and yes I have long lists (simple or checkbox) in some cells

I also have links and files in some cells, does it work?

I don’t like tables in Anytype. The aim is to quickly and easily make a comparative table, with formatting options that highlight the different points, and often several images.
Anytype is very limited and more laborious to use, so it’s not very appealing. Each time, I went back to Evernote!

Shampra · April 22, 2024, 7:11am

If the table is used for a complex layout, unfortunately it won’t work (if a cell contains rich content with images, paragraphs, etc., that content will be lost).
At least in Anytype, it’s possible to make columns, no need to cheat with tables!

Links are inline (basically the formatting of a part of a text) so no worries. It’s the blocks that aren’t supported: images, files, embed links, titles, separator lines, paragraphs, etc.
For your information, headings are treated as formatting here, so they are replaced by bold text.

For files… I have to test whether Anytype can support them if we “force” the import of a cell with a file inside.

If you can test any of your tables and tell me if it works (or what’s wrong), thank you.

jxelam · May 19, 2024, 1:44pm

Hi @Shampra, thanks for making this tool; it looks extremely useful.

I’m using it to try and migrate several thousand notes across a few dozen notebooks. There are some parsing issues that cause it to error out, which I had to fix (I’ll do a PR at some point), but mostly it worked just fine.

Is there a way to name the collection? At the moment all my enex files are named after the notebook name, so ideally I’d like to carry that over.

Also, I have nested notebooks (stacks in Evernote), and am wondering if there’s a way to replicate that with nested collections? I had a quick look but couldn’t see anything obvious in the any-block specs.
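As a starting point for carrying names over, here is a small Python sketch that derives a notebook name, and optionally a stack name, from how the exported files are stored on disk. Everything about the layout is an assumption: it supposes one `.enex` file per notebook, named after the notebook, with notebooks that belong to a stack placed in a folder named after that stack. The function name is hypothetical, and nothing here is a feature of the converter or of Anytype’s import; it only computes the names.

```python
from pathlib import Path


def notebook_layout(export_root):
    """Map each .enex file to (stack, notebook); stack is None for top-level files."""
    root = Path(export_root)
    layout = {}
    for path in sorted(root.rglob("*.enex")):
        parent = path.parent.relative_to(root)
        # The first folder level below the root is taken as the stack name.
        stack = parent.parts[0] if parent.parts else None
        layout[path] = (stack, path.stem)
    return layout
```

For example, a file stored as `Work/Projects.enex` under the export folder would map to the stack `Work` and the notebook `Projects`.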

Finally, and this could just be me misunderstanding, but I noticed lots of the notes come through into the collection with inline attachments showing up as separate entries, seemingly as well as still being inline. Is there a way to get the default view so it’s just showing the actual notes?

I don’t know how much of that is possible. I know an API is planned, but not until Q4, and it’s not obvious to me what’s possible vs not in terms of the import format.

Shampra · May 20, 2024, 7:12am

Hey @jxelam

Anytype works differently from Evernote, so you need to rethink your organization. As long as the data is properly imported (don’t forget Anytype’s few limitations, for example on tables), everything’s fine, so all you have to do is play around with it to create an environment that suits you. This is the case for notebooks (names, nesting), which Evernote unfortunately doesn’t export anyway.

Anytype imports into a time-stamped collection (which I personally hate).

To reproduce an Evernote notebook, there are several solutions. Here’s a guide to one of them:

Rename the collection with your notebook name (click on the title)

To display only notes and not files, add a filter: New filter > Change “Name” to “Object type” > Change “All” to “Has any of” > Add > Page

I also recommend modifying the relations displayed (Description isn’t used, you might as well remove it and display Evernote Tags).

Now for a nested notebook, create a collection with the root notebook name “Notebook stack”.
To add a collection to it, Anytype has a real problem because it doesn’t clearly allow it directly…
So, go back to your notebook “My note”,

Go back to your “stack” collection, it’s done!

Last step: create a widget with your stack collection.

Check:


If you want exactly the same thing as in Evernote: Evernote has “Notebooks” > “Notebook stack” > “My notebook”… yes, another level of nesting is needed, so a “Notebooks” collection has to be created and the same manipulation repeated to put in what you want.
But Anytype can’t display nested collections in the menu.
Edit: Oh, you can do it with a simple page. I’ll update later.

Don’t hesitate to test to find out what suits you best!
I’ll make a separate post for “recreating Evernote in Anytype”, so it’ll be more visible than here.

And don’t hesitate to point out what’s missing to make Anytype as efficient as Evernote, or to vote for the proposals that interest you! There seem to be quite a few users migrating, but very few participating (compared to Notion users).

Better simple table :

Tag view in the sidebar menu :

Batch editing relations (like tags) :

etc

Dedicated post for this tutorial :

jxelam · May 20, 2024, 10:11am

Thank you for the very informative reply!

I’m also a Notion user, and I imagine the higher engagement probably stems from the fact that there’s a built-in import from Notion which is more fully featured, so there’s less friction.

It sounds like I need to investigate the any-block format a bit more and see if I can modify your import tool to better represent the relationships out of the box to match the sort of setup you kindly outlined.