Repository

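In addition to the database, AiiDA also stores information as files in the repository. The repository is optimized for storing large numbers of files, which allows AiiDA to scale to high-throughput workloads. As a consequence, although the files are stored somewhere on the local file system, they cannot be accessed directly with file system tools. Instead, you should interact with the repository through the application programming interface (API).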

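Each node can have its own virtual file hierarchy, and the repository content of a node is accessed through the Node class. The hierarchy is virtual because the files are not necessarily written to disk with the same hierarchy. For more technical information on the implementation, see the repository internals section.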
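To make "virtual hierarchy" concrete, here is a toy model (not AiiDA's implementation): file contents live in one flat store keyed by a hash of their bytes, and each node only keeps a mapping from relative paths to keys. The class names ToyObjectStore and ToyNodeRepository are hypothetical; the real storage layer is described in the repository internals section.

```python
import hashlib


class ToyObjectStore:
    """Flat, content-addressed blob store: each blob is keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}

    def add(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.blobs.setdefault(key, content)
        return key


class ToyNodeRepository:
    """Per-node virtual hierarchy: a mapping from relative path to blob key."""

    def __init__(self, store: ToyObjectStore):
        self.store = store
        self.paths = {}

    def put(self, path: str, content: bytes) -> None:
        self.paths[path] = self.store.add(content)

    def read(self, path: str) -> bytes:
        return self.store.blobs[self.paths[path]]


store = ToyObjectStore()
node_a = ToyNodeRepository(store)
node_b = ToyNodeRepository(store)
# Different virtual paths, same bytes: only the path mappings differ.
node_a.put('sub/directory/nested.txt', b'same bytes')
node_b.put('other/name.txt', b'same bytes')
```

In this model no directory exists anywhere on disk: "sub/directory" is only part of a key in node_a's path mapping.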

Writing to the repository

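To write files to a node, you can use one of the following three methods: put_object_from_file(), put_object_from_filelike() and put_object_from_tree().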

Suppose you have a file /some/path/file.txt on the local file system that you want to copy into a node. The most straightforward solution is:

node = Node()
node.put_object_from_file('/some/path/file.txt', 'file.txt')

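Note that the first argument should be an absolute file path. The second argument is the name under which the file is written in the node's repository. It can be any valid file name, as long as it is relative. The target name may contain nested subdirectories, for example some/relative/path/file.txt, and these directories do not need to exist beforehand.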
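The two rules for the target name (it must be relative, and intermediate directories need not exist) can be illustrated with pathlib. The helper below, implied_directories, is hypothetical: it only shows which directories a nested target name implies and does not reproduce AiiDA's own path validation.

```python
from pathlib import PurePosixPath


def implied_directories(path):
    """Hypothetical helper: reject absolute targets and list the parent
    directories that a relative target path implies."""
    target = PurePosixPath(path)
    if target.is_absolute():
        raise ValueError(f'repository paths must be relative, got {path!r}')
    # `parents` runs from the innermost directory outwards and ends with '.',
    # so reverse it and drop the '.' entry.
    return [str(parent) for parent in list(target.parents)[::-1] if str(parent) != '.']


print(implied_directories('some/relative/path/file.txt'))
# prints ['some', 'some/relative', 'some/relative/path']
```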

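Alternatively, a file can be written to a node from a stream or file-like object. This is useful when the file content is already in memory, because it avoids writing it to the local file system first. For example, one could do: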

with open('/some/path/file.txt', 'rb') as handle:  # binary mode passes the raw bytes through unchanged
    node = Node()
    node.put_object_from_filelike(handle, 'file.txt')

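This is equivalent to the previous example, except that the file is first opened in a context manager and the resulting file-like object is passed in. The put_object_from_filelike() method works with any file-like object, for example byte and text streams: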

import io
node = Node()
node.put_object_from_filelike(io.BytesIO(b'some content'), 'file.txt')
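Byte streams and text streams differ in what read() returns: bytes for the former, str for the latter. A byte-oriented store has to reconcile the two, and the sketch below shows one way to do so by encoding text as UTF-8. The helper name read_as_bytes and the UTF-8 choice are assumptions for illustration, not a description of what put_object_from_filelike() does internally.

```python
import io


def read_as_bytes(handle, encoding='utf-8'):
    """Hypothetical helper: drain a file-like object and return bytes,
    encoding the data first if the handle is a text stream."""
    data = handle.read()
    if isinstance(data, str):
        data = data.encode(encoding)
    return data


from_bytes = read_as_bytes(io.BytesIO(b'some content'))  # byte stream
from_text = read_as_bytes(io.StringIO('some content'))   # text stream
```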

Finally, instead of writing one file at a time, you can write the contents of an entire directory to the node's repository:

node = Node()
node.put_object_from_tree('/some/directory')

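The contents of the directory are written recursively to the node's repository. Optionally, they can be written to a subdirectory of the repository instead: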

node = Node()
node.put_object_from_tree('/some/directory', 'some/sub/path')
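The recursive write can be pictured as mapping every file below the local directory to a relative path, optionally nested under the second argument. The stdlib sketch below computes that mapping for a small temporary tree. The helper tree_to_repository_paths is hypothetical; it only shows the resulting names, not how AiiDA stores the files.

```python
import os
import tempfile
from pathlib import Path, PurePosixPath


def tree_to_repository_paths(local_root, prefix=None):
    """Hypothetical helper: list the repository paths that the files under
    `local_root` would receive, optionally nested under `prefix`."""
    root = Path(local_root)
    base = PurePosixPath(prefix) if prefix else PurePosixPath()
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            relative = (Path(dirpath) / filename).relative_to(root)
            # Repository paths always use forward slashes, whatever the OS.
            result.append(str(base.joinpath(*relative.parts)))
    return sorted(result)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'sub').mkdir()
    (Path(tmp) / 'a.txt').write_text('a')
    (Path(tmp) / 'sub' / 'b.txt').write_text('b')
    at_root = tree_to_repository_paths(tmp)
    nested = tree_to_repository_paths(tmp, 'some/sub/path')
```

Here at_root is ['a.txt', 'sub/b.txt'], and nested carries the 'some/sub/path/' prefix on each entry.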

As with put_object_from_file(), the subdirectory does not need to be created explicitly first.

Listing repository content

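To determine the contents of a node's repository, you can use the following methods: list_object_names(), list_objects(), get_object() and walk().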

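The first method returns a list with the names of the objects in the node's repository, where each object can be either a directory or a file: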

In [1]: node.list_object_names()
Out[1]: ['sub', 'file.txt']

To list the contents of a subdirectory, simply pass its path as an argument:

In [1]: node.list_object_names('sub/directory')
Out[1]: ['nested.txt']

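Note that the elements of the returned list are plain strings, so there is no way to tell whether they correspond to a directory or a file. If you need this information, use list_objects() instead. This method returns a list of File objects, which have file_type and name attributes that return the type and the name of the file object, respectively. Example usage: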

from aiida.repository.common import FileType

for obj in node.list_objects():
    if obj.file_type == FileType.DIRECTORY:
        print(f'{obj.name} is a directory.')
    elif obj.file_type == FileType.FILE:
        print(f'{obj.name} is a file.')
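Because the hierarchy is virtual, a listing has to be derived from the stored paths. The toy helper below does this for a flat set of file paths matching the examples in this section and classifies each child as a file or a directory. Kind is a local stand-in for FileType, and neither it nor list_children is part of AiiDA; the alphabetical ordering is a choice of this sketch.

```python
from enum import Enum


class Kind(Enum):
    """Stand-in for aiida.repository.common.FileType, used only in this sketch."""
    DIRECTORY = 'directory'
    FILE = 'file'


def list_children(paths, directory=''):
    """Hypothetical helper: derive the immediate children of `directory`
    from a flat collection of file paths."""
    prefix = directory.rstrip('/') + '/' if directory else ''
    children = {}
    for path in paths:
        if not path.startswith(prefix):
            continue
        head, sep, _rest = path[len(prefix):].partition('/')
        # A '/' after the first component means `head` is an intermediate directory.
        children[head] = Kind.DIRECTORY if sep else Kind.FILE
    return sorted(children.items())


paths = {'file.txt', 'sub/directory/nested.txt'}
top_level = list_children(paths)
nested = list_children(paths, 'sub/directory')
```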

To retrieve a specific file object by its relative path, use get_object():

In [1]: node.get_object('sub/directory/nested.txt')
Out[1]: File(file_type=FileType.FILE, name='nested.txt')

Finally, to iterate recursively over the contents of a node's repository, use the walk() method. It works just like the os.walk method of the Python standard library:

In [1]: for root, dirnames, filenames in node.walk():
   ...:     print(root, dirnames, filenames)
   ...:
. ['sub'] ['file.txt']
sub ['directory'] []
sub/directory [] ['nested.txt']
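To see the shape of the tuples yielded by a walk without an AiiDA profile, the stdlib sketch below rebuilds the same layout in a temporary directory and walks it with os.walk, expressing each root relative to the top. walk_relative is a hypothetical helper that mirrors the output shown above; it is not AiiDA's code.

```python
import os
import tempfile
from pathlib import Path, PurePosixPath


def walk_relative(local_root):
    """Hypothetical helper: os.walk, but with each root expressed as a
    POSIX path relative to `local_root` and names sorted for stable output."""
    top = Path(local_root)
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # sorting in place also fixes the order in which os.walk descends
        relative = Path(dirpath).relative_to(top)
        yield PurePosixPath(*relative.parts), dirnames, sorted(filenames)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'sub' / 'directory').mkdir(parents=True)
    (Path(tmp) / 'file.txt').write_text('top level')
    (Path(tmp) / 'sub' / 'directory' / 'nested.txt').write_text('nested')
    triples = [(str(root), list(dirnames), filenames) for root, dirnames, filenames in walk_relative(tmp)]
```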

Reading from the repository

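To retrieve the content of files stored in a node's repository, you can use the following methods: open() and get_object_content().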

The first method works just like Python's built-in open function:

with node.open('some/file.txt', 'r') as handle:
    content = handle.read()

If you want to read the content directly into memory, the get_object_content() method provides a shortcut:

content = node.get_object_content('some/file.txt', 'r')

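Both methods accept a second argument that determines whether the file is opened in text or binary mode; the valid values are 'r' and 'rb', respectively. Note that these methods can only be used to read content from the repository, so any other mode (such as 'wb') raises an exception. To write files to the repository, use the methods described in the writing to the repository section.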
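The read-only contract can be sketched over an in-memory store: text mode decodes the stored bytes, binary mode returns them as-is, and every other mode is refused. The helpers open_readonly and get_content are hypothetical, and raising ValueError is this sketch's choice; the text above does not specify which exception AiiDA raises.

```python
import io

ALLOWED_READ_MODES = {'r', 'rb'}


def open_readonly(files, path, mode='r', encoding='utf-8'):
    """Hypothetical helper: open an entry of an in-memory {path: bytes} store
    for reading only, in text ('r') or binary ('rb') mode."""
    if mode not in ALLOWED_READ_MODES:
        raise ValueError(f'unsupported mode {mode!r}; only {sorted(ALLOWED_READ_MODES)} are allowed')
    data = files[path]
    return io.BytesIO(data) if mode == 'rb' else io.StringIO(data.decode(encoding))


def get_content(files, path, mode='r'):
    """Shortcut mirroring the open-then-read pattern."""
    with open_readonly(files, path, mode) as handle:
        return handle.read()


files = {'some/file.txt': b'hello'}
text = get_content(files, 'some/file.txt')        # 'hello'
raw = get_content(files, 'some/file.txt', 'rb')   # b'hello'

try:
    open_readonly(files, 'some/file.txt', 'wb')
    write_rejected = False
except ValueError:
    write_rejected = True
```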

Copying from the repository

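To copy specific files from a node's repository, the section on reading from the repository explains how to read their content and then write it elsewhere. Sometimes, however, you want to copy the entire content of a node's repository, or one of its subdirectories. The copy_tree() method makes this easy: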

node.copy_tree('/some/target/directory')

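This writes the entire repository content of the node to the directory /some/target/directory on the local file system. To copy only a subdirectory of the repository, pass it as the second argument, path: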

node.copy_tree('/some/target/directory', path='sub/directory')

Combined with put_object_from_tree(), this method makes it easy to copy the entire repository content (or a subdirectory) from one node to another:

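import tempfile

from aiida.orm import Node, load_node

node_source = load_node(<PK>)  # <PK>: placeholder for the pk of the source node
node_target = Node()

with tempfile.TemporaryDirectory() as dirpath:
    node_source.copy_tree(dirpath)  # dump the source repository to a temporary directory
    node_target.put_object_from_tree(dirpath)  # write it into the target repository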

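Note that this approach is not the most efficient, because the files are first written from node_source to a temporary directory on disk, then read back from disk and written to the repository of node_target. A more efficient approach, which requires a bit more code, uses the walk() method explained in the listing repository content section directly: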

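from aiida.orm import Node, load_node

node_source = load_node(<PK>)  # <PK>: placeholder for the pk of the source node
node_target = Node()

for root, dirnames, filenames in node_source.walk():
    for filename in filenames:
        filepath = root / filename  # walk() yields `root` as a path object, so `/` joins it with the name
        # Open in binary mode so the bytes are copied unchanged.
        with node_source.open(filepath, 'rb') as handle:
            node_target.put_object_from_filelike(handle, filepath)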

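Note

In the example above, only the files are copied over explicitly. Any intermediate nested directories are created automatically in the virtual hierarchy. However, it is currently not possible to create a directory explicitly, and empty directories are not yet supported.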
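The consequence described in this note can be reproduced on an ordinary file system with the stdlib sketch below. Copying file by file creates the needed parent directories on the fly, while an empty directory, which contains no file to copy, is silently dropped. copy_files_only is a hypothetical helper that works on local directories, not on AiiDA nodes.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_files_only(source, target):
    """Hypothetical helper: copy every file below `source` into `target`,
    creating parent directories on demand, in the style of the walk()-based loop above."""
    source, target = Path(source), Path(target)
    for dirpath, _dirnames, filenames in os.walk(source):
        for filename in filenames:
            relative = (Path(dirpath) / filename).relative_to(source)
            destination = target / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            with open(Path(dirpath) / filename, 'rb') as src, open(destination, 'wb') as dst:
                shutil.copyfileobj(src, dst)  # stream in chunks instead of reading the whole file


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    (Path(src_dir) / 'sub' / 'directory').mkdir(parents=True)
    (Path(src_dir) / 'empty').mkdir()  # contains no files, so nothing will be copied for it
    (Path(src_dir) / 'sub' / 'directory' / 'nested.txt').write_bytes(b'nested')
    copy_files_only(src_dir, dst_dir)
    copied = sorted(p.relative_to(dst_dir).as_posix() for p in Path(dst_dir).rglob('*'))
```

After the copy, copied lists sub, sub/directory and sub/directory/nested.txt, but not empty.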