❓Node FAQ

Topology

(See Figure 1 for topology overview, TBD)

API User:

Standard S3 commands, aiming for similar coverage to our Flostream service.

Load Balancer:

Planning for redundant gateways per cell to enhance scalability and resilience. The load balancer assigns a gateway for each session.

Gateway Functionality:

  • Control Plane: Incorporates cell_control functionality, transparent to the existing flo.stream backend.

  • Metadata Store: Utilizes an external service for resilience and support of multiple gateways per cell.

  • Untrusted Storage: Adjustments in operation due to the inclusion of many untrusted storage nodes.

Storage Nodes:

With untrusted nodes, our approach deviates from the existing flo.stream-style architecture in the following ways:

  • No Trust: Continuous validation of storage node presence and data integrity.

  • No Metadata: Nodes don’t access cell metadata; we rely on directory patterns and command states.

  • Full File Storage: Chosen for reasons including reduced metadata and potentially faster transfers.

  • Encryption: Mandatory for data in transit and at rest, given the untrusted nature of nodes.

Operation

User additions, key additions, and other operations remain unchanged: the new deployment appears to the web backend as additional cells, with communication mirroring current flo.stream protocols.

S3 API Experience

Aiming for a user experience similar to the current service.

Monitoring, Logs

Similar to current gateway functions in flo.stream, with Flostream appearing as additional cells.

Gateway Maintenance

Simplified modifications or replacements due to external metadata storage.

Adding Storage Nodes

Adding a node involves a process that starts from the web front end and culminates in storing the node's information in the database. This includes generating a unique node ID and informing the node of its ID and gateway address, likely through a config file. The node sends a 'hello' message to the gateway upon startup, which includes its API key and a unique security token. Future communications require matching security tokens.
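A minimal sketch of the node-side hello, assuming JSON over HTTPS; the endpoint path, config file format, and field names beyond node ID, API key, and security token are hypothetical:

```python
# Node-side hello sketch. Assumptions: JSON over HTTPS, a /v1/hello endpoint,
# and a JSON config file; node_id, api_key, and security_token come from the
# design above.
import json
import secrets
import urllib.request

def load_config(path="node.conf.json"):
    # Config written when the node was added via the web front end:
    # expects node_id, api_key, and gateway_url.
    with open(path) as f:
        return json.load(f)

def send_hello(cfg):
    # A fresh security token accompanies the hello; the gateway records it
    # and requires a matching token on all future messages from this node.
    token = secrets.token_hex(32)
    body = json.dumps({
        "node_id": cfg["node_id"],
        "api_key": cfg["api_key"],
        "security_token": token,
    }).encode()
    req = urllib.request.Request(
        cfg["gateway_url"] + "/v1/hello",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return token, resp.status

# Usage: token, status = send_hello(load_config())
```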

FAQ

1. How does the gateway choose storage nodes for a new file?

The gateway will use a scoring system to choose nodes, probably including the following parameters (a scoring sketch follows this list):

  • Trusted or untrusted

  • Available storage

  • Ping time

  • Reputation: seniority, etc.
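As an illustration only, a scoring sketch over the parameters above; the weights, caps, and replica count are placeholder assumptions, not tuned values:

```python
# Illustrative node-scoring sketch over the parameters listed above.
from dataclasses import dataclass

@dataclass
class NodeStats:
    trusted: bool
    free_bytes: int
    ping_ms: float
    reputation: float  # e.g. 0.0-1.0, grows with seniority and good behavior

def score(node: NodeStats) -> float:
    s = 0.0
    s += 2.0 if node.trusted else 0.0          # prefer trusted nodes
    s += min(node.free_bytes / 1e12, 1.0)      # cap storage credit at 1 TB free
    s += max(0.0, 1.0 - node.ping_ms / 200.0)  # faster pings score higher
    s += node.reputation
    return s

def pick_nodes(nodes, k=3):
    # Choose the k best-scoring nodes to hold the new file.
    return sorted(nodes, key=score, reverse=True)[:k]
```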

2. How will node usage be tracked and reported?

We expect to collect this information and write it to the database in much the same manner as we currently do for bucket storage and egress monitoring. Information would be keyed to nodeid and include:

  • Storage used per day.

  • Bytes of ingress and egress for that node.

  • Minutes of availability.

  • Current reputation score.

The business details can be handled by the web back-end.
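A sketch of how such a record might be written, keyed to nodeid as described; the table name, column names, and SQLite backend are assumptions for illustration:

```python
# Per-node daily usage record sketch, keyed to (nodeid, day).
import sqlite3

def record_usage(db, nodeid, day, storage_bytes, ingress, egress,
                 minutes_up, reputation):
    db.execute(
        """CREATE TABLE IF NOT EXISTS node_usage (
               nodeid TEXT, day TEXT,
               storage_bytes INTEGER, ingress_bytes INTEGER,
               egress_bytes INTEGER, minutes_available INTEGER,
               reputation REAL,
               PRIMARY KEY (nodeid, day))""")
    db.execute(
        "INSERT OR REPLACE INTO node_usage VALUES (?,?,?,?,?,?,?)",
        (nodeid, day, storage_bytes, ingress, egress, minutes_up, reputation))
    db.commit()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    record_usage(db, "node-42", "2024-01-01", 5 * 10**11, 10**9, 2 * 10**9,
                 1440, 0.93)
```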

3. Currently on Flostream, metadata needs to be stored locally on a very fast disk. How could Flostream move metadata to an external service without affecting speed?

The main enabler was eliminating most of the metadata, which moving to full-file storage made possible. Previously, metadata had to track every block of every file, which greatly inflated its volume.

Another way we retain speed is by caching frequently used metadata on each gateway machine.
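A minimal caching sketch, assuming a fetch_metadata() client for the external metadata store (hypothetical) and an in-memory LRU on each gateway:

```python
# Gateway-side metadata cache sketch; fetch_metadata() stands in for the
# external metadata service client.
from functools import lru_cache

@lru_cache(maxsize=100_000)  # placeholder size
def get_metadata(object_key: str) -> dict:
    # Hit: served from gateway memory. Miss: one round trip to the store.
    return fetch_metadata(object_key)

def fetch_metadata(object_key: str) -> dict:
    # Stand-in for the external metadata service lookup.
    return {"key": object_key, "size": 0, "nodes": []}
```

Cache invalidation on metadata writes is omitted here for brevity; in practice entries would need to be evicted or versioned when metadata changes.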

4. How can you be sure an untrusted storage node has not altered a file?

We already need to periodically ping each storage node to see if it is still present on the network. We will integrate some testing into that process:

Our current plan is to quiz storage nodes about file contents as part of the ping process. During the initial transfer we will hash random byte ranges of the file being transferred and store this information in metadata. When we ping a node, we will ask it to return a hash of one of these ranges and ensure that it matches our metadata. We will also ensure that whole-file hashes are performed during reads and that they match what we have stored in metadata. We will also experiment with zero-knowledge proofs to optimize this workflow.
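A sketch of that range-hash scheme: hash random byte ranges at upload time, then verify a node's answer during a ping. The range count and span are assumptions:

```python
# Range-hash challenge sketch: the gateway stores (start, end, digest)
# triples in metadata at upload time and later asks the node to hash one.
import hashlib
import random

def make_challenges(data: bytes, count=8, span=4096):
    ranges = []
    for _ in range(count):
        start = random.randrange(max(1, len(data) - span))
        end = min(start + span, len(data))
        digest = hashlib.sha256(data[start:end]).hexdigest()
        ranges.append((start, end, digest))
    return ranges  # stored in the gateway's metadata for this file

def verify_challenge(node_hash: str, challenge) -> bool:
    # node_hash is what the storage node returned for (start, end).
    _, _, expected = challenge
    return node_hash == expected
```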

5. What happens when a storage node becomes unreachable?

Here is our current plan, though it is subject to change (a state-machine sketch follows the lists below).

Short term: for short outages, on the order of 15 minutes or so:

  • The node is marked missing and pings continue.

  • No replications are started.

  • Reconnection is possible from the same IP address. We ask the node to send a new hello message before marking it active.

Long term: for longer outages, the node is marked failed:

  • No pings.

  • Files assigned to that node are replicated to different nodes.

  • No reconnection is possible. If that node later sends a hello message, an error will be returned.
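A simplified state-machine sketch of these transitions; the 15-minute threshold comes from the text above, while the record shape and start_replication helper are placeholders:

```python
# Missing/failed transition sketch, driven by ping timeouts. Reconnection
# checks (hello handling) are sketched under question 6.
import time

MISSING_LIMIT = 15 * 60  # seconds: "on the order of 15 minutes"

def on_ping_timeout(node):
    # node is a dict such as {"state": "active", "missing_since": None}.
    if node["state"] == "active":
        node["state"] = "missing"          # keep pinging, no replication yet
        node["missing_since"] = time.time()
    elif node["state"] == "missing":
        if time.time() - node["missing_since"] > MISSING_LIMIT:
            node["state"] = "failed"       # stop pings, re-replicate files
            start_replication(node)

def start_replication(node):
    # Placeholder: copy this node's files to other nodes chosen by the
    # scoring system from question 1.
    pass
```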

6. Can a storage node pose as a different node?

We currently rely on testing the security token nodes send with every message to the gateway, plus our record of the node's last known IP address. For an imposter to pose as another node, the following must happen:

  • The legal node must be marked missing and not responding to pings.

  • The imposter performs a hello using the node id and security token of the legal node, from that node's last known address.

An imposter node can become the node of record if it is first to send a hello message for a given node id.
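A sketch of the gateway-side checks this implies, using a constant-time token comparison; the record field names are assumptions:

```python
# Gateway-side hello verification sketch; "record" is the gateway's stored
# state for a node id.
import hmac

def verify_hello(record, token: str, from_ip: str) -> bool:
    if record["state"] == "failed":
        return False  # failed nodes may not reconnect
    if not hmac.compare_digest(record["security_token"], token):
        return False  # security token mismatch
    if record["state"] == "missing" and from_ip != record["last_ip"]:
        return False  # a missing node must reconnect from its last known IP
    return True
```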

7. Can a storage node change its IP address?

The simple security mentioned above doesn’t include this possibility, but it sounds like a possible use case. One way to enable this could involve some user action in the web GUI and the database, perhaps to loosen the requirement that a missing node reconnect using its last known address.
